Similarity and Dissimilarity in Treebank Grammars
Abstract
To uncover rules in a treebank grammar which are of dubious quality, we investigate two methods for detecting problematic structures, both based on the same notion of similarity. The first is based on the notion that similar rules should receive the same annotation. The second is based on the idea that rules which are dissimilar to other rules are likely problematic. We show these two methods to be effective in detecting erroneous rules, rules used for ungrammatical or otherwise non-standard constructions, and rules which reveal non-uniform decisions made in the annotation scheme.
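The two ideas in the abstract can be illustrated with a small sketch. The abstract does not define its notion of similarity, so everything concrete here is an assumption made for illustration: the helper names `reduce_rhs` and `find_suspect_rules`, the toy rule counts, and the choice to treat two rules as similar when their right-hand sides match after collapsing adjacent repeated categories. It is not the authors' implementation.

```python
from collections import defaultdict
from itertools import groupby


def reduce_rhs(rhs):
    """Hypothetical similarity key: collapse adjacent repeated categories,
    so ("DT", "JJ", "JJ", "NN") and ("DT", "JJ", "NN") count as similar."""
    return tuple(label for label, _ in groupby(rhs))


def find_suspect_rules(rules):
    """rules maps (lhs, rhs_tuple) -> frequency in the treebank.

    Returns two lists of (lhs, rhs) pairs:
      inconsistent: similar right-hand sides annotated with different
                    left-hand sides (first idea in the abstract)
      isolated:     rules seen once that are similar to no other rule
                    (second idea in the abstract)
    """
    by_key = defaultdict(list)
    for (lhs, rhs), count in rules.items():
        by_key[reduce_rhs(rhs)].append((lhs, rhs, count))

    inconsistent, isolated = [], []
    for members in by_key.values():
        lhs_labels = {lhs for lhs, _, _ in members}
        if len(lhs_labels) > 1:
            inconsistent.extend((lhs, rhs) for lhs, rhs, _ in members)
        elif len(members) == 1 and members[0][2] == 1:
            lhs, rhs, _ = members[0]
            isolated.append((lhs, rhs))
    return inconsistent, isolated


# Toy grammar with invented counts (not taken from any real treebank).
toy_rules = {
    ("NP", ("DT", "NN")): 500,
    ("NP", ("DT", "JJ", "NN")): 120,
    ("NP", ("DT", "JJ", "JJ", "NN")): 40,
    ("VP", ("DT", "JJ", "NN")): 1,
    ("NP", ("NN", "CC", "DT")): 1,
}
inconsistent, isolated = find_suspect_rules(toy_rules)
```

In this toy grammar the three rules whose reduced right-hand side is DT JJ NN are flagged as inconsistent because they carry both NP and VP labels, and the one-off rule NP -> NN CC DT is flagged as isolated. A real system would need a more careful similarity measure and frequency threshold than this sketch assumes.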
Similar resources
Linguistic Issues in Language Technology – LiLT
We outline a method of detecting ad hoc, or anomalous, rules in treebank grammars, by exploiting the fact that such rules do not fit with the rest of the grammar. Ad hoc rules are rules used for specific constructions in one data set and unlikely to be used again. These include ungeneralizable rules, erroneous rules, rules for ungrammatical text, and rules which are not consistent with the rest...
Extraction of Tree Adjoining Grammars from a Treebank for Korean
We present the implementation of a system which extracts not only lexicalized grammars but also feature-based lexicalized grammars from the Korean Sejong Treebank. We report on some practical experiments where we extract TAG grammars and tree schemata. Above all, full-scale syntactic tags and well-formed morphological analysis in the Sejong Treebank allow us to extract syntactic features. In addition, ...
Grammar Extraction from Treebanks for Hindi and Telugu
Grammars play an important role in many Natural Language Processing (NLP) applications. The traditional approach to creating grammars manually, besides being labor-intensive, has several limitations. With the availability of large scale syntactically annotated treebanks, it is now possible to automatically extract an approximate grammar of a language in any of the existing formalisms from a cor...
Comparing and integrating Tree Adjoining Grammars
Grammars are core elements of many NLP applications. Grammars can be developed in two ways: built by hand or extracted from corpora. In this paper, we compare a handcrafted grammar with a Treebank grammar. We contend that recognizing substructures of the grammars' basic units is necessary [...] tures and semantic information which are rarely represented in the corpora. It would be ideal if we could c...
Treebank vs. Xbar-based Automatic F-structure Annotation
Manual, large-scale (computational) grammar development is time-consuming, expensive and requires lots of linguistic expertise. More recently, a number of alternatives based on treebank resources (such as Penn-II, Susanne, AP treebank) have been explored. The idea is to automatically "induce" or rather read off (P)CFG grammars from the parse-annotated treebank resources and to use the treebank g...
Publication date: 2008